22 research outputs found

    Recovering networks from distance data

    A fully probabilistic approach to reconstructing Gaussian graphical models from distance data is presented. The main idea is to extend the central Wishart model used in traditional methods to a likelihood that depends only on pairwise distances and is therefore independent of geometric assumptions about the underlying Euclidean space. This extension has two advantages: the model becomes invariant to potential bias terms in the measurements, and it can be used in situations where the input is a kernel or distance matrix, without requiring direct access to the underlying vectors. The latter aspect opens up a large new application field for Gaussian graphical models, as network reconstruction is now possible from any Mercer kernel, be it on graphs, strings, probabilities or more complex objects. We combine this likelihood with a suitable prior to enable Bayesian network inference. We present an efficient MCMC sampler for this model and discuss the estimation of module networks. Experiments demonstrate the high quality and usefulness of the inferred networks.
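The invariance to bias terms and the use of a kernel matrix as input can be illustrated with the standard identity that turns a kernel (inner-product) matrix K into squared Euclidean distances, D_ij = K_ii + K_jj - 2 K_ij. The sketch below is not the paper's likelihood or sampler; the function names and the toy points are illustrative assumptions. It only shows that adding a constant offset to every vector changes the kernel matrix but leaves the pairwise distances unchanged.

```python
def gram_matrix(vectors):
    """Linear-kernel (inner-product) matrix of a list of vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


def kernel_to_sq_distances(K):
    """Squared distances from a kernel matrix: D_ij = K_ii + K_jj - 2 K_ij."""
    n = len(K)
    return [[K[i][i] + K[j][j] - 2.0 * K[i][j] for j in range(n)] for i in range(n)]


# Toy data (hypothetical): three points and a copy shifted by a constant offset.
points = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
shifted = [[x + 10.0, y - 5.0] for x, y in points]

K = gram_matrix(points)
K_shifted = gram_matrix(shifted)

D = kernel_to_sq_distances(K)
D_shifted = kernel_to_sq_distances(K_shifted)

print(K != K_shifted)   # the kernel matrices differ after the shift
print(D == D_shifted)   # the squared distances do not
```

Because only D enters a distance-based likelihood, any method built on it inherits this shift invariance, and any valid Mercer kernel can supply K in place of explicit vectors.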

    Probabilistic Clustering of Time-Evolving Distance Data

    We present a novel probabilistic clustering model for objects that are represented via pairwise distances and observed at different time points. The proposed method uses the information given by adjacent time points to find the underlying cluster structure and obtain a smooth cluster evolution. This approach allows the number of objects and clusters to differ at every time point, and the identities of the objects need not be known. Further, the model does not require the number of clusters to be specified in advance; it is instead determined automatically using a Dirichlet process prior. We validate our model on synthetic data, showing that the proposed method is more accurate than state-of-the-art clustering methods. Finally, we use our dynamic clustering model to analyze and illustrate the evolution of brain cancer patients over time.
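To see why a Dirichlet process prior removes the need to fix the number of clusters, it can help to sample partitions from its Chinese restaurant process representation: object i joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to a concentration parameter alpha. The sketch below covers only this static prior, not the time-evolving, distance-based model of the abstract; the function name and parameter values are illustrative assumptions.

```python
import random


def sample_crp_partition(n_objects, alpha, rng):
    """Draw cluster assignments for n_objects from a Chinese restaurant process."""
    assignments = []  # cluster index of each object
    counts = []       # current size of each cluster
    for i in range(n_objects):
        # Total unnormalised weight: i objects already seated, plus alpha.
        r = rng.random() * (i + alpha)
        chosen = len(counts)  # default: open a new cluster
        cumulative = 0.0
        for k, size in enumerate(counts):
            cumulative += size
            if r < cumulative:
                chosen = k
                break
        if chosen == len(counts):
            counts.append(1)
        else:
            counts[chosen] += 1
        assignments.append(chosen)
    return assignments, counts


# The number of clusters is an outcome of the draw, not an input.
assignments, counts = sample_crp_partition(50, alpha=1.0, rng=random.Random(0))
print(len(counts), counts)
```

Larger values of alpha tend to produce more clusters; in a full model the data likelihood, not just the prior, determines how many clusters are actually used.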

    Full-length haplotype reconstruction to infer the structure of heterogeneous virus populations

    Next-generation sequencing (NGS) technologies enable new insights into the diversity of virus populations within their hosts. Diversity estimation is currently restricted to single-nucleotide variants or to local fragments of no more than a few hundred nucleotides, defined by the length of sequence reads. To study complex heterogeneous virus populations comprehensively, novel methods are required that allow for complete reconstruction of the individual viral haplotypes. Here, we show that assembly of whole viral genomes of ∼8600 nucleotides in length is feasible from mixtures of heterogeneous HIV-1 strains derived from defined combinations of cloned virus strains and from clinical samples of an HIV-1 superinfected individual. Haplotype reconstruction was achieved using optimized experimental protocols and computational methods for amplification, sequencing and assembly. We comparatively assessed the performance of the three NGS platforms 454 Life Sciences/Roche, Illumina and Pacific Biosciences for this task. Our results prove and delineate the feasibility of NGS-based full-length viral haplotype reconstruction and provide new tools for studying evolution and pathogenesis of viruses.

    Neuromatch Academy: a 3-week, online summer school in computational neuroscience

    Neuromatch Academy (https://academy.neuromatch.io; van Viegen et al., 2021) was designed as an online summer school to cover the basics of computational neuroscience in three weeks. The materials cover dominant and emerging computational neuroscience tools and how they complement one another, and focus specifically on how they can help us to better understand how the brain functions. An original component of the materials is their focus on modeling choices, i.e. how do we choose the right approach, how do we build models, and how can we evaluate models to determine if they provide real (meaningful) insight. This meta-modeling component of the instructional materials asks what questions can be answered by different techniques, and how to apply them meaningfully to get insight about brain function.

    Machine learning methods for HIV/AIDS diagnostics and therapy planning

    The focus of the thesis is the development and application of Machine Learning methods to the domain of HIV/AIDS diagnostics and therapy planning. The thesis addresses this domain from two different facets. In Facet I, we analyse the genetically diverse HIV populations present in an infected patient's blood samples. Understanding genetic diversity is crucial for further insights into viral-host interactions, the evolution of drug-resistant viral lineages within an infected host, and personalised medication, where drugs are prescribed to a patient based on their viral lineage. With the help of recent sequencing technologies, one can generate short fragments of the viral strains, called reads, from infected blood samples. These reads are used in genetic-diversity studies. The puzzle is in matching every read to its parent strain or haplotype, which can be seen as a standard clustering task. Given error-prone reads with limited lengths, the main modelling challenge is that non-overlapping reads do not have any suitable a priori pairwise similarity measure; this leads to a non-standard clustering problem. None of the previous approaches has provided a convincing strategy to solve this issue. In this work we overcome this problem by introducing a propagating Dirichlet Process Mixture Model. In Facet II, we take the first steps to identify similarity patterns between drugs used in HIV/AIDS therapy and active chemical compounds. Currently, only a small number of anti-HIV drugs are available to prepare drug cocktails. When a viral lineage becomes resistant to a particular drug, it tends to show resistance to other drugs in the same drug category, a property called cross-resistance. This situation demands the development of newer and more resilient drugs, and thus an in-depth understanding of the similarities between the current drugs and active chemical compounds is necessary. This is done by examining a landscape of active chemical compounds that also contains the drugs.
    With respect to this, we develop two models: one for Network Inference and another for Automatic Archetype Analysis. For network inference, we present a fully probabilistic approach that infers networks from pairwise Euclidean distances of 'n' objects, where the objects are active chemical compounds. For automatic archetype analysis, we develop a sparsity-inducing model based on a Group-Lasso formulation that identifies the representative/archetypal objects given a set of 'n' objects (or active chemical compounds). The model is aided by a well-defined criterion, the Bayesian Information Criterion (BIC), that enables automatic model selection.
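The role of BIC in automatic model selection can be sketched with its standard definition, BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of observations and L the maximised likelihood; the candidate with the lowest BIC is preferred. How the thesis counts parameters for the Group-Lasso model is not stated in the abstract, so the candidate names, log-likelihoods and parameter counts below are made-up illustrative values, not results.

```python
import math


def bic(log_likelihood, n_params, n_samples):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def select_by_bic(candidates, n_samples):
    """Return the name of the lowest-BIC candidate and all scores.

    candidates maps a model name to (log_likelihood, n_params).
    """
    scores = {
        name: bic(log_lik, k, n_samples)
        for name, (log_lik, k) in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical candidates: a better fit costs more parameters.
candidates = {
    "3 archetypes": (-120.0, 3),
    "5 archetypes": (-112.0, 5),
    "10 archetypes": (-110.0, 10),
}
best, scores = select_by_bic(candidates, n_samples=100)
print(best)
for name, score in scores.items():
    print(name, round(score, 2))
```

In this toy example the middle candidate wins: moving from 5 to 10 parameters improves the fit only slightly, and the ln(n) penalty outweighs that gain.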